{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Example of ZhipuAI GLM-4 Vision Capabilities\n",
    "\n",
    "**This tutorial is available in English and is attached below the Chinese explanation**\n",
    "\n",
    "此代码将使用几个经典的案例展现GLM-4V的视觉理解能力。这个教程将展示如何使用Python SDK调用GLM-4V API来完成简单的视觉理解和分析工作。\n",
    "\n",
    "This cookbook will describe some classic cases of GLM-4V's vision understanding capabilities. This tutorial will show you how to use the Python SDK to call the GLM-4V API to complete simple visual understanding and analysis tasks."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "b9d2202d7ce04a48"
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 1. Set up the API key\n",
    "\n",
    "首先，设定好调用模型的API key\n",
    "First, set the API key for calling the model"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "284516f23cb3d30f"
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"ZHIPUAI_API_KEY\"] = \"your zhipuai api key\""
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-26T08:23:19.254870Z",
     "start_time": "2024-05-26T08:23:19.250393Z"
    }
   },
   "id": "f03a65879fc8edb8"
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 2. set up the client and upload the image\n",
    "\n",
    "我们书写了一个简单的对话程序，让您能和 GLM-4V 进行简单的对话，并感受 GLM-4V的能力。\n",
    "\n",
    "We have written a simple conversation program that allows you to have a simple conversation with GLM-4V and feel the capabilities of GLM-4V."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "524d71278f724da1"
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "import base64\n",
    "import io\n",
    "from PIL import Image\n",
    "from zhipuai import ZhipuAI\n",
    "\n",
    "client = ZhipuAI()\n",
    "\n",
    "\n",
    "def image_to_base64(image_path):\n",
    "    \"\"\"\n",
    "    Convert an image to base64 encoding.\n",
    "    \"\"\"\n",
    "    with Image.open(image_path) as image:\n",
    "        buffered = io.BytesIO()\n",
    "        image.save(buffered, format=\"JPEG\")  # or format=\"PNG\", depending on your image.\n",
    "        img_str = base64.b64encode(buffered.getvalue()).decode()\n",
    "    return f\"data:image/jpeg;base64,{img_str}\"\n",
    "\n",
    "\n",
    "def analyze_image(image_path, question):\n",
    "    \"\"\"\n",
    "    Analyze an image based on the provided question.\n",
    "    \n",
    "    Parameters:\n",
    "        image_path (str): The path to the image file.\n",
    "        question (str): The question to ask about the image.\n",
    "        \n",
    "    Returns:\n",
    "        str: The answer to the question.\n",
    "    \"\"\"\n",
    "    base64_image = image_to_base64(image_path)\n",
    "\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": question\n",
    "                },\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": base64_image\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.chat.completions.create(\n",
    "        model=\"glm-4v\",\n",
    "        messages=messages,\n",
    "        temperature=0.1,\n",
    "        top_p=0.1\n",
    "    )\n",
    "\n",
    "    return response.choices[0].message.content"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-26T08:23:19.558999Z",
     "start_time": "2024-05-26T08:23:19.255902Z"
    }
   },
   "id": "4f3135b451b85400"
  },
  {
   "cell_type": "code",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9号西瓜看起来是最熟的。因为9号西瓜的颜色是黄绿色的，这通常表明西瓜已经完全成熟。而其他西瓜仍然是深绿色，这意味着它们可能还没有完全成熟。\n",
      "======================\n",
      "图片中的文字是：亲爱的，我想告诉你，只要我们用心去温暖我们的每一天，幸福就会照耀我们。\n",
      "亲爱的，我想告诉你，只要我们可以行动每天给自己一点提升成功就会光顾我们。\n",
      "这两段文字的寓意是：告诉我们要珍惜每一天，努力提升自己，相信成功会到来。\n",
      "======================\n",
      "这幅漫画揭示了科学研究中一个普遍存在的问题：数据准备和清理工作占据了研究人员大量的时间和精力，而真正用于创新性研究和分析的时间却相对较少。这种现象是由于当前的研究评价体系往往更注重发表论文的数量和质量，而对研究过程的复杂性和挑战性认识不足。\n",
      "\n",
      "这幅漫画呼吁学术界更加重视研究过程中的实际问题，并鼓励政策制定者和资助机构调整评价标准，以反映研究的全貌。同时，它也提醒研究者们关注研究方法的可持续性和可重复性，以便更好地利用时间和资源进行创新性的研究工作。\n",
      "======================\n",
      "这张流程图描述的是一个由盐水蒸馏得到纯净水的实验过程。\n"
     ]
    }
   ],
   "source": [
    "print(analyze_image(\"data/watermelon.jpeg\", \"哪个西瓜看起来最熟，请给出序号并说明理由\"))\n",
    "print(\"======================\")\n",
    "print(analyze_image(\"data/handwrite.jpg\", \"输出这张图的文字，并分析文字的寓意\"))\n",
    "print(\"======================\")\n",
    "print(analyze_image(\"data/cartoon.jpeg\", \"这张漫画的寓意\"))\n",
    "print(\"======================\")\n",
    "print(analyze_image(\"data/flow.png\", \"这张流程图是描述什么的\"))"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-26T08:23:33.702969Z",
     "start_time": "2024-05-26T08:23:19.563576Z"
    }
   },
   "id": "5d98bee000b52279",
   "execution_count": 3
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
